We investigate the use of iterated function system (IFS) models for data analysis.
An IFS is a discrete dynamical system in which each time step corresponds to the 
application of one of a finite collection of maps. The maps, which represent distinct 
dynamical regimes, may act in some pre-determined sequence or may be applied in 
random order. An algorithm is developed to detect the sequence of regime switches 
under the assumption of continuity. This method is tested on a simple IFS and 
applied to an experimental computer performance data set. This methodology has a 
wide range of potential uses: from change-point detection in time-series data to the 
field of digital communications. 

An IFS is a discrete dynamical system in which one member of a finite collection
of maps acts at each time step. The maps, which represent distinct dynamical 
regimes of the overall system, may act randomly or in some pre-determined 
sequence. This is a useful framework for understanding the dynamics of a wide 
class of interesting systems, ranging from digital communication channels to 
the human brain. In this paper, we first review the IFS framework and then 
present an algorithm that leverages the associated properties to segment the 
signal. Working under the assumption that each of the maps is continuous, 
this algorithm uses topology to detect the different components in the output 
data. The main idea behind this approach is that nearby state-space points 
can evolve in different ways, depending on the dynamical regime of the IFS; the 
main challenge is that the components may overlap, causing their trajectories to 
locally coincide. We demonstrate this algorithm on two examples, a Henon IFS 
and an experimental computer performance data set. In both cases, we were 
able to segment the signals effectively, identifying not only the times at which 
the dynamics switches between regimes, but also the number and forms of the 
deterministic components themselves. 



I. INTRODUCTION 

Any approach to time series analysis begins with the question: is the data stochastic
or deterministic [1,2]? Often, the answer may be "both": the data could be generated by a
deterministic system with a noisy component, perhaps due to measurement or computer
round-off error. In this article, we propose an alternative possibility: the data could be
generated by a sequence of deterministic dynamical systems selected by a switching process
that itself could be deterministic or stochastic, i.e., by an Iterated Function System (IFS).
For a detailed review of IFS dynamics, see Diaconis and Freedman. If this is the case,
then a useful goal is to identify the times at which switching between regimes occurs, as well
as the number and forms of the deterministic components themselves. Under the assumption 
that each deterministic system is continuous, we use topology to detect and separate the 
components of the IFS that are present in the output data. The main idea behind this 
approach is that nearby state-space points can evolve in different ways, depending on
the dynamical state of the IFS. Such a model has some relation to the determination of
states in a hidden Markov model; however, hidden Markov models are typically discrete
and stochastic, not continuous and deterministic. A primary challenge in this problem is
that overlap between the images of distinct regime functions could cause their trajectories
to locally coincide. The use of IFS models for physical systems is not new; for example,
Broomhead et al. used an IFS to model digital communication channels. In the current
paper, we provide new tools to extract IFS models from experimental data and to determine 
the sequence of switching between regimes in the IFS. 

We believe the method proposed here will prove useful in a number of applications. For
example, detection and separation of IFS components is closely related to the statistical
problem of event or change-point detection, where time-series data is assumed to come from
a statistical distribution that changes suddenly. Applications where change-point detection
plays a role include fraud detection in cellular systems, intrusion detection in computer
networks, irregular-motion detection in computer vision, and fault detection in engineering
systems, among many others. Our underlying hypothesis differs from that of statistical
change-point detection: we assume that each regime is deterministic. For
example, though change-point detection has been successfully applied to determine brain
states from EEG data, EEGs have also been shown to exhibit properties of low-dimensional
chaos [7]. Indeed, low-dimensional dynamics occurs in diverse areas including physiology,
ecology, and economics. We expect that the separation technique outlined below could
be used to produce more-accurate models of regime shifts and the effects of rapid parameter
changes that occur, e.g., in the onset of seizures, natural disasters, or the bursting of economic
bubbles.

II. DETECTION AND SEPARATION 

Given a time series that corresponds to measurements of a dynamical system, our goal 
is to develop a technique that will detect whether the series is generated by an Iterated 
Function System (IFS) and to distinguish its components. Formally, an IFS is a
discrete-time dynamical system that consists of a finite set of maps
$\{f_0, \ldots, f_n, \ldots, f_{N-1}\}$ on a state space $X$. A trajectory of the IFS is a
sequence of state-space points, $\{x_0, \ldots, x_t, x_{t+1}, \ldots\}$, together with a regime
sequence $\{n_0, \ldots, n_t, n_{t+1}, \ldots\}$ with $n_t \in \{0, 1, \ldots, N-1\}$, such that

$$x_{t+1} = f_{n_t}(x_t), \quad \forall t \in \mathbb{N}.$$

Without loss of generality, we may assume that each map occurs at least once in the regime 
sequence, since otherwise the missing maps could be eliminated. 

In the standard study of IFS dynamics, the regime sequence is often taken to be a
realization of some random process; however, we only assume that we have access to a single
trajectory that is generated by a particular realization. Consequently, the selection rule for
the regime sequence is immaterial; indeed, it could just as well be a discrete, deterministic
dynamical system. The standard theory, in addition, often requires that each $f_n$ is a
contraction mapping, in which case the IFS is hyperbolic and has a unique attractor $A$ that is
invariant in the sense that $A = \bigcup_{n=0}^{N-1} f_n(A)$. We do not need this assumption, and only
assume that the trajectory lies in some bounded region of $X$.

We will assume that the time series corresponds to $T$ measurements on a particular
state-space sequence,

$$\Gamma = \{x_0, x_1, \ldots, x_{T-1}\}; \qquad (1)$$

but that the regime sequence is unmeasurable or hidden. For example, one may be able to
measure the position of a forced pendulum at a sequence of times, but the pendulum may
have a sealed brake mechanism that sets a friction coefficient and that is controlled externally
to the experiment. Measurement of $\Gamma$ also implicitly includes that of its associated shift map

$$\sigma(x_t) = x_{t+1}. \qquad (2)$$

It is often the case that a time series corresponds to a limited measurement, perhaps of
one variable from a multi-dimensional dynamical system. In this case, the first step is to
use delay-coordinate embedding to construct, as much as is possible, a topologically faithful
image of the orbit in a reconstructed state space [13,14]. We suppose that $\Gamma$ is this embedded
time series.
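
As a rough illustration of delay-coordinate embedding, the sketch below builds $m$-dimensional delay vectors from a scalar series. The function name `delay_embed`, the forward-delay convention $(s_t, s_{t+\tau}, \ldots)$, and the toy input are assumptions made for this example; the paper does not specify an implementation.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], m: int, tau: int) -> List[Tuple[float, ...]]:
    """Return the delay vectors (s[t], s[t+tau], ..., s[t+(m-1)*tau]).

    m is the embedding dimension and tau the delay, both in samples.
    An empty list is returned when the series is too short.
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    n_vectors = len(series) - (m - 1) * tau
    return [
        tuple(series[t + i * tau] for i in range(m))
        for t in range(max(n_vectors, 0))
    ]


# Toy usage: six samples, dimension 3, delay 2.
embedded = delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2)
```

Each returned tuple plays the role of one point $x_t$ of the embedded trajectory; consecutive tuples are related by the shift map of Eq. (2).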

The fundamental goal in this paper is detection and separation: to detect if $\Gamma$ is a
trajectory of an IFS and to separate the regimes by recovering the sequence $\{n_t\}$. This problem
is relatively straightforward when $\Gamma$ is a subset of some non-overlapping region of the IFS,
i.e., a region $R$ such that $f_i(R) \cap f_j(R) = \emptyset$ for all $i \neq j$. In this paper, we address a
more-general situation in which $\Gamma$ could be sampled from an overlapping region of the IFS.

A fundamental requirement for our separation method is that the maps $f_n$ are continuous,
a reasonable assumption for the vast majority of physical systems. In particular, the image
of a connected set under each $f_n$ must be connected. Since the notion of connectivity makes
no sense for finite data sets, we will instead use $\epsilon$-connectivity under the assumption that
$X$ is a metric space with distance $d(x,y)$.

Definition ($\epsilon$-connected). A set $\Omega \subset X$ is $\epsilon$-connected if there exists an $\epsilon$-chain connecting
the points in $\Omega$, i.e., for each pair of points $x, y \in \Omega$ there exists a sequence $\{z_0, \ldots, z_k\} \subset \Omega$
such that $x = z_0$, $y = z_k$, and $d(z_j, z_{j+1}) < \epsilon$ for $0 \le j \le k-1$.

Let $\mathcal{N}_k(x_t)$ denote the set of $k$ points consisting of $x_t$ and its $(k-1)$ nearest neighbors in $\Gamma$.
For each such set there will be a $\delta$ such that $\mathcal{N}_k(x_t)$ is $\delta$-connected.
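
For a finite point set, the definition above amounts to single-linkage grouping: two points belong to the same component when a chain of steps shorter than $\epsilon$ joins them. The following is a minimal sketch of that computation using a union-find structure and the Euclidean metric; the name `eps_components` and the brute-force pairwise loop are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def eps_components(points: Sequence[Point], eps: float) -> List[List[int]]:
    """Group point indices into eps-connected components.

    Two points are linked when their Euclidean distance is < eps;
    components are the connected pieces of the resulting graph.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy usage: three points chained at spacing 0.5 and one distant point.
toy = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0)]
components = eps_components(toy, eps=0.6)
```

A set is $\epsilon$-connected exactly when this function returns a single group. The pairwise loop costs $O(n^2)$ distance evaluations, which is adequate for the small neighborhoods used here.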

The idea of our algorithm is as follows. For each $\epsilon$ that is not too small, there must be
a $k > 1$ such that the image of $\mathcal{N}_k(x_t)$ under a single map will be $\epsilon$-connected. Indeed,
continuity implies there is a $\delta$ such that a $\delta$-connected set has an $\epsilon$-connected image. For
a given $\epsilon$, the minimal $\delta$ will be determined by the maximal distortion of the map. For
the algorithm to work, the set $\Gamma$ must be dense enough so that for this $\delta$, there are nearest
neighbors, i.e., $k > 1$.

If $\epsilon$ is chosen to reflect this maximal, single-map distortion, then whenever the
time-shifted image, $\sigma(\mathcal{N}_k(x_t))$, consists of a number of $\epsilon$-connected components, each component
should reflect the action of a different $f_n$. This idea is expressed visually in Fig. 1. Note
that $\sigma(\mathcal{N}_k(x_t))$ is NOT the same as $\mathcal{N}_k(x_{t+1})$, the set of nearest neighbors to the image of
$x_t$.

To obtain reasonable results the parameter $\epsilon$ must be selected carefully, as it will determine
the maximal number of nearest neighbors, $k$. The number, $N$, of regimes of the IFS is
not more than the maximal number of components of $\sigma(\mathcal{N}_k(x_t))$. However, since sparsely
covered portions of the data set could result in spurious components, we will select $N$ to be
the number of components in the bulk of the images $\sigma(\mathcal{N}_k(x_t))$.

FIG. 1. Sketch of the action of the shift map $\sigma$ on a 7-nearest neighborhood of a point $x_{10} \in \Gamma$, which
results in two $\epsilon$-connected components that can be identified as $f_0(\mathcal{N}_7(x_{10}))$ and $f_1(\mathcal{N}_7(x_{10}))$.

Given a time series $\Gamma$ that we suspect to be generated by an IFS, the detection and
separation algorithm requires an appropriate value for $\epsilon$. Here we outline a possible algorithm.

• (Detection) Determine a value for $\epsilon$ by computing histograms of the separations
between the points of $\mathcal{N}_2(x_t)$ and between the points of $\sigma(\mathcal{N}_2(x_t))$. If $\Gamma$ is sampled
from a connected invariant set, then each
of the nearest neighbor sets should be $\delta$-connected. If there is more than one regime,
their images should be disconnected for some choice of $\epsilon$. The number of regimes $N$
is estimated to be the number of components in the majority of the $\sigma(\mathcal{N}_k(x_t))$; this
should be persistent over a range of $\epsilon$ and $k$ values.

• (Separation) Select a set of $K$-nearest-neighborhoods, $\{\Omega_j = \mathcal{N}_K(x_{t_j}) \mid j = 0, 1, \ldots, J-1\}$,
that overlap and cover $\Gamma$. Points are identified to be from a common regime if they
lie in overlapping neighborhoods and their images lie in $\epsilon$-connected components.
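
The detection step can be sketched as follows: for every point that has an image, form its $k$-nearest neighborhood, shift each member forward one step, and count the $\epsilon$-connected components of the shifted set. The helper names, the brute-force neighbor search, and the use of the most common count as the estimate of $N$ are assumptions made for this illustration; a practical implementation would likely use a spatial index.

```python
import math
from collections import Counter
from typing import List, Sequence

Point = Sequence[float]


def nearest_neighborhood(traj: Sequence[Point], t: int, k: int) -> List[int]:
    """Indices of x_t and its k-1 nearest neighbors, restricted to points
    that have an image under the shift (the last point is excluded)."""
    candidates = range(len(traj) - 1)
    return sorted(candidates, key=lambda s: math.dist(traj[t], traj[s]))[:k]


def count_eps_components(points: Sequence[Point], eps: float) -> int:
    """Number of eps-connected components, found by flood fill."""
    unvisited = set(range(len(points)))
    count = 0
    while unvisited:
        count += 1
        stack = [unvisited.pop()]
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) < eps}
            unvisited -= near
            stack.extend(near)
    return count


def component_histogram(traj: Sequence[Point], k: int, eps: float) -> Counter:
    """Histogram of the number of eps-components of sigma(N_k(x_t))."""
    hist: Counter = Counter()
    for t in range(len(traj) - 1):
        images = [traj[s + 1] for s in nearest_neighborhood(traj, t, k)]
        hist[count_eps_components(images, eps)] += 1
    return hist


def estimate_regimes(traj: Sequence[Point], k: int, eps: float) -> int:
    """Most common component count, used here as the estimate of N."""
    return component_histogram(traj, k, eps).most_common(1)[0][0]


# Hand-built 1-D toy trajectory: points near 0 are sent either near 1 or to 5.
toy_traj = [(0.00,), (1.00,), (0.01,), (1.01,), (0.02,), (5.0,), (0.03,)]
toy_hist = component_histogram(toy_traj, k=3, eps=0.5)
```

Repeating the histogram over a range of `eps` and `k` values gives a simple check of the persistence mentioned in the detection step.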

In the next section, we illustrate this method on a simple example. 



III. EXAMPLE: A HENON IFS 



As a simple example, consider the IFS generated by the two quadratic, planar
diffeomorphisms

$$\begin{aligned}
f_0(x,y) &= (y + 1 - 1.4x^2,\; 0.3x), \\
f_1(x,y) &= (y + 1 - 1.2(x - 0.2)^2,\; -0.2x).
\end{aligned} \qquad (3)$$

The map $f_0$ is Henon's quadratic map with the canonical choice of parameter values; the
map $f_1$ is conjugate, via an affine change of coordinates, to Henon's map with parameters
$(a, b) = (0.912, -0.2)$. We generate a single trajectory of this IFS by using a Bernoulli process
with equal probability to generate a sequence $n_t \in \{0, 1\}$. A trajectory with $T = 30{,}000$
points, shown in Fig. 2, has the appearance of two overlapping Henon-like attractors. Note,
however, that since most points on $\Gamma$ are not iterated more than a couple of consecutive
steps with the same map, $\Gamma$ is not just the union of the attractors of $f_0$ and $f_1$. Indeed the
attractor for $f_1$ is simply a fixed point at $(0.63986, -0.12797)$.
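
A trajectory of this kind can be generated with a few lines of code. The sketch below iterates the two maps of Eq. (3) with a fair coin choosing the regime at each step; the initial condition, the seed, and the absence of any transient removal are assumptions of this example and are not stated in the text.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def f0(x: float, y: float) -> Point:
    """Henon's map with the canonical parameters (a, b) = (1.4, 0.3)."""
    return (y + 1 - 1.4 * x * x, 0.3 * x)


def f1(x: float, y: float) -> Point:
    """Second regime of Eq. (3)."""
    return (y + 1 - 1.2 * (x - 0.2) * (x - 0.2), -0.2 * x)


def henon_ifs(
    T: int, seed: int = 0, start: Point = (0.0, 0.0)
) -> Tuple[List[Point], List[int]]:
    """Return T points and the T-1 regime labels that generated them.

    regimes[t] is the index n_t of the map taking points[t] to points[t+1].
    The start point and seed are illustrative choices.
    """
    rng = random.Random(seed)
    maps = (f0, f1)
    points = [start]
    regimes = []
    for _ in range(T - 1):
        n = rng.randrange(2)  # Bernoulli process with equal probability
        regimes.append(n)
        points.append(maps[n](*points[-1]))
    return points, regimes


# Short usage example; the paper uses T = 30,000.
points, regimes = henon_ifs(10, seed=1)
```

Keeping the regime labels alongside the points makes it possible to score a separation afterwards, as is done with the known values of $n_t$ in this section.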




FIG. 2. A trajectory of the IFS generated by (3) with $T = 30{,}000$ points. Here $n_t \in \{0, 1\}$ are
chosen with equal probability.

To recover the regime sequence from $\Gamma$ we must check for $\epsilon$-disconnected images of
$\delta$-connected components. As a first step to determine an appropriate value for $\epsilon$, we compute
the distance between each point in $\Gamma$ and its nearest neighbor, i.e., the diameter of $\mathcal{N}_2(x_t)$.
This is shown in panel (a) of Fig. 3 as a histogram. Note that all but two points in $\Gamma$
have a nearest neighbor within 0.02, and the vast majority within 0.002. Panel (b) of Fig. 3
indicates how these distances grow upon iteration: it shows the distance between the iterates
of each of these nearest neighbors, i.e., the diameter of $\sigma(\mathcal{N}_2(x_t))$. There are now two distinct
distributions separated by a gap $[0.02, 0.032]$. This suggests that the dynamics underlying
$\Gamma$ is discontinuous, and that a choice of $\epsilon$ in the gap may be appropriate.
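
The two sets of distances behind such histograms can be computed as in the sketch below. Reading $\epsilon$ off the widest empty interval between sorted image distances is a heuristic assumed here for illustration; in the text the gap is identified by inspecting the histogram. The function names are hypothetical and the neighbor search is brute force.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def neighbor_and_image_distances(
    traj: Sequence[Point],
) -> Tuple[List[float], List[float]]:
    """For each x_t that has an image, return the distance to its nearest
    neighbor and the distance between the images of that pair."""
    last = len(traj) - 1  # the final point has no image under the shift
    before: List[float] = []
    after: List[float] = []
    for t in range(last):
        s = min(
            (u for u in range(last) if u != t),
            key=lambda u: math.dist(traj[t], traj[u]),
        )
        before.append(math.dist(traj[t], traj[s]))
        after.append(math.dist(traj[t + 1], traj[s + 1]))
    return before, after


def widest_gap(values: Sequence[float]) -> Tuple[float, float]:
    """Endpoints of the largest empty interval between sorted values."""
    ordered = sorted(values)
    _, lo, hi = max((b - a, a, b) for a, b in zip(ordered, ordered[1:]))
    return lo, hi


# Hand-built 1-D toy trajectory with one regime switch (0.03 -> 5.0).
toy_traj = [(0.00,), (1.00,), (0.01,), (1.02,), (0.03,), (5.0,), (0.07,)]
before, after = neighbor_and_image_distances(toy_traj)
gap_lo, gap_hi = widest_gap(after)
```

Any $\epsilon$ strictly between `gap_lo` and `gap_hi` separates the two groups of image distances in the toy data; on real data the gap should be checked against the histogram before it is trusted.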

Suppose that we did not know that the IFS (3) had two regimes, and that only the trajectory
$\Gamma$ of Fig. 2 was available. To detect the number of regimes, we look at the number of
$\epsilon$-components in the image of the sets of five nearest neighbors, $\mathcal{N}_5(x_t)$. Histograms of the
number of $\epsilon$-components of $\sigma(\mathcal{N}_5(x_t))$ are shown in Fig. 4 as $\epsilon$ varies from 0.005 to 0.05. The
vast majority of these neighborhoods split into at most two $\epsilon$-components. When $\epsilon$ is as
small as 0.005, about 3% split into four or more components, and when $\epsilon > 0.02$, only 0.3%
split into three or more components. Note that with the equal probability rule that we used
for (3), the probability that all five points in $\mathcal{N}_5(x_t)$ will be iterated with the same map is
$2 \times (1/2)^5 = \frac{1}{16}$, which is confirmed in Fig. 4 since about 6% of the images have one $\epsilon$-component.

Thus in the detection phase of the algorithm, we confirm that the underlying dynamics
has two regimes, $N = 2$, and obtain a reasonable choice, $\epsilon = 0.03$.

FIG. 3. Distance between each point on $\Gamma$ of Fig. 2 and its nearest neighbor (a) and between the
images of these two points (b).

FIG. 4. Detection of the number of components of the images of sets of $K = 5$ nearest neighbors
for the trajectory $\Gamma$ of Fig. 2. The histograms show the number of the images
$\{\sigma(\mathcal{N}_5(x_t)) \mid 0 \le t \le 29{,}999\}$ that have $N$ $\epsilon$-connected components for various $\epsilon$.

For the separation phase of the algorithm, we wish to classify which points on $\Gamma$ are images
of which map. To do this we choose larger, overlapping neighborhoods that cover $\Gamma$ so that
we can connect the subsets for each regime. To distribute these neighborhoods more or
less evenly over $\Gamma$, we select $J$ points $\{y_0, y_1, \ldots, y_{J-1}\}$ by first choosing $y_0 \in \Gamma$ arbitrarily,
and subsequently incrementing $j$ and selecting $y_j$ to be the point of $\Gamma$ farthest from the
previously selected points. Each selected point is the nexus of the $K$-nearest-neighborhood

$$\Omega_j = \mathcal{N}_K(y_j).$$

We choose $K = 40$ and $J = 10^4$, so that most of the $\Omega_j$ overlap with other neighborhoods,
in the sense that they share points in $\Gamma$. In this case, each of the $\Omega_j$ is 0.03-connected.
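
The selection of the centers $y_j$ reads like greedy farthest-point sampling. The sketch below takes "farthest from the previously selected points" to mean the point whose minimum distance to the already selected set is largest; that interpretation, the function name, and the early stop when only duplicate points remain are assumptions of this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def farthest_point_centers(
    points: Sequence[Point], J: int, first: int = 0
) -> List[int]:
    """Greedily pick up to J indices, each maximizing the minimum
    distance to the centers chosen so far. `first` is the arbitrary
    starting index (the role of y_0)."""
    J = min(J, len(points))
    centers = [first]
    # dist_to_centers[i] = distance from point i to its closest center
    dist_to_centers = [math.dist(p, points[first]) for p in points]
    while len(centers) < J:
        nxt = max(range(len(points)), key=dist_to_centers.__getitem__)
        if dist_to_centers[nxt] == 0.0:
            break  # only duplicates of existing centers remain
        centers.append(nxt)
        for i, p in enumerate(points):
            dist_to_centers[i] = min(dist_to_centers[i], math.dist(p, points[nxt]))
    return centers


# Toy usage on four points of the real line.
toy_points = [(0.0,), (1.0,), (10.0,), (5.0,)]
toy_centers = farthest_point_centers(toy_points, J=3)
```

Each update costs one pass over the data, so selecting $J$ centers from $T$ points takes $O(JT)$ distance evaluations.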

The separation into regimes is accomplished as follows: whenever two "overlapping" $\Omega_j$'s
have $\epsilon$-connected image components that intersect, we identify them as being generated by
the same $f_n$; see the sketch in Fig. 5. More specifically, suppose that $\Omega_{j,k}$ is the set of points
in $\Omega_j$ that generate the $k$th $\epsilon$-component of $\sigma(\Omega_j)$. These are distinguished by the following:
whenever $\Omega_{j_1,k_1} \cap \Omega_{j_2,k_2} \neq \emptyset$, the images $\sigma(\Omega_{j_1,k_1})$ and $\sigma(\Omega_{j_2,k_2})$ share a
point as well, and thus their union is $\epsilon$-connected. In this case, the points in these images are selected
as being generated by the same regime $f_n$. That is, $f_n(\Omega_{j_i,k_i}) = \sigma(\Omega_{j_i,k_i})$ for $i = 1, 2$.

FIG. 5. Separation of the time series $\Gamma$ into regimes. Here $\Omega_i$ and $\Omega_j$ represent the 8-nearest
neighborhoods of $y_i$ and $y_j$, respectively. They overlap, having $x_{17}$ in common. Each of the
neighborhoods has two $\epsilon$-connected images under the shift $\sigma$. The pair that share $\sigma(x_{17}) = x_{18}$ are
identified to be in the same regime, say $n = 0$, so the 8 points in this $\epsilon$-connected image set (and
their preimages) are colored blue to indicate the common regime.

The $\Omega_{j,k}$ can be thought of as nodes on an abstract graph. Whenever two of these
neighborhoods share a point, an edge linking these nodes is added to the graph. Using this
construction, the connected components of the resulting graph are selected as images of a
fixed regime. Of course, we do not know which of the $f_n$'s is associated with which graph
component unless we have prior knowledge of some values of the functions.
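
The graph construction just described can be sketched with a union-find structure: each piece $\Omega_{j,k}$ is a node, given here as a set of time indices, and two nodes are merged whenever they share an index. The representation of pieces as Python sets and the name `separate_regimes` are assumptions of this illustration.

```python
from typing import Dict, Iterable, List, Set


def separate_regimes(pieces: Iterable[Set[int]]) -> List[Set[int]]:
    """Merge pieces that share a time index and return the merged
    index sets, largest first. Each returned set is a candidate regime."""
    pieces = list(pieces)
    parent = list(range(len(pieces)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner: Dict[int, int] = {}  # time index -> first piece containing it
    for p, piece in enumerate(pieces):
        for t in piece:
            if t in first_owner:
                parent[find(p)] = find(first_owner[t])  # add a graph edge
            else:
                first_owner[t] = p

    merged: Dict[int, Set[int]] = {}
    for p, piece in enumerate(pieces):
        merged.setdefault(find(p), set()).update(piece)
    return sorted(merged.values(), key=len, reverse=True)


# Toy usage: two chains of overlapping pieces and one isolated piece.
toy_pieces = [{1, 2, 3}, {3, 4}, {10, 11}, {11, 12}, {20}]
toy_regimes = separate_regimes(toy_pieces)
```

Small leftover sets, such as `{20}` in the toy input, correspond to points that cannot be attached to any large graph component.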

For the trajectory of Fig. 2 and using the covering by the $10^4$ neighborhoods $\Omega_j$, this
algorithm generates two large connected graph components, one containing 14,724 points
and the other 14,815 points. These sets of points are shown in the panels (a) and (b) of
Fig. 6, respectively. Comparing these results with the known values of $n_t$ shows that every
point in the first graph component has $n_t = 0$ and every point in the second has $n_t = 1$;
that is, the separation had no false positives. There are an additional 465 points of $\Gamma$
that are not in these two graph components. These unidentified points represent sparsely
visited regions of the trajectory.

It is no coincidence that the points identified to be images of $f_0$ in Fig. 6(a) appear to lie
close to the attractor of the standard Henon map, which is shown in grey (light red, online)
in the figure. Note, however, that even though the attractor for $f_1$ is a fixed point (the cross
in the figure), the strong perturbation due to $f_0$ iterations causes the points in Fig. 6(b) to
range far from the attractor of $f_1$.

FIG. 6. Panel (a) and panel (b) show the points identified as iterates of $f_0$ and $f_1$, respectively.
These points can be approximately interpreted as a sampling of $f_0(\Gamma)$ ($f_1(\Gamma)$). Also shown, in grey
(colored red, online), are points on the attractor of the Henon map $f_0$, and a cross at the position
of the fixed point of $f_1$.

IV. COMPUTER PERFORMANCE DYNAMICS 

In this section, we describe the application of the regime separation algorithm to a time
series from an experimental computer performance analysis data set. A
critical performance bottleneck in modern computer systems occurs in the efficient manage- 
ment of memory. The cache is the level of memory closest to the processor; it is preloaded 
with the data that the system thinks it will need. When the system looks for a necessary 
piece of data in the cache and does not find it, it must load the data from main memory, 
resulting in a major performance slowdown. Such an event is called a cache miss. 

The experiment to investigate the frequency of cache misses consists of repeatedly running 
the simple C program: 

for (i = 0; i < 255; i++)
    for (j = i; j < 255; j++)
        data[i][j] = 0;

on an Intel Core2® processor. This code initializes the upper triangular portion of a matrix
in row-major order. As the program runs, the hardware performance monitors built into the
processor monitor the memory usage patterns, in particular the rate of cache misses. The
program is interrupted every $10^5$ instructions and the number of cache misses that occurred
over that interval is recorded. We obtained a time series consisting of 86,107 points, from
which we used a representative 60,000-point segment for the work presented in this paper.
A snippet of the time series, along with the first return map of the 60,000 points selected,
is shown in Fig. 7. This data set has been studied previously and shown to exhibit chaotic
dynamics [17,18].




FIG. 7. (a) Cache misses per $10^5$ instructions observed during the execution of the code above.
(b) First return map of the time series from Fig. 7(a). [Axis labels: time (instructions × 100,000);
cache misses.]



The time delay embedding process we use to analyze the computer performance trace
involves the estimation of two parameters: the time delay $\tau$ and the embedding dimension
$m$. We first choose $\tau$ using standard practices [13,14]. Based on the first minimum of the
time-delayed average mutual information, we choose $\tau = 10^5$ instructions. After choosing the
delay $\tau$, the next step in the embedding process is to estimate the dimension $m$. A standard
strategy for this is the "false nearest neighbor" approach of Abarbanel and Kennel. Based
on this algorithm, we estimate that $10 < m < 25$, then narrow down that range to $m = 12$
using other dynamical invariants. Note that the first return map shown in Fig. 7(b) is
simply a two-dimensional projection of the 12-dimensional state space embedding, $\Gamma \subset \mathbb{R}^{12}$.
For a more detailed discussion of these choices and our approach to estimating them see
Garland and Bradley [20] or Mytkowicz et al.



The observation of the ghost triangles in Fig. 7(b), seemingly reminiscent of three
overlapping attractors from an IFS, prompted us to apply our regime-separation algorithm to
this data. Because the two ghosts are much more lightly sampled than the "main" triangle,
our conjecture was that the IFS consisted of three functions and that the switching process
prioritized one of the three.

FIG. 8. Distance between each point on $\Gamma$ (the 12-dimensional time-delay embedding of the
computer performance data) and its nearest neighbor (a) and between the images of these two
points (b).

For the analysis, we chose $\epsilon = 75$. This choice is justified by observation of the histograms
shown in Fig. 8. Figure 8(a) is the histogram of the distances between each point in the
time-series embedding and its nearest neighbor, i.e., the diameter of $\mathcal{N}_2(x_j)$ in $\mathbb{R}^{12}$. Figure
8(b) shows the histogram of distances between the same two points after iterating them
both forward one time step, i.e., the diameter of $\sigma(\mathcal{N}_2(x_j))$. The two 'humps' indicate that,
for a generic pair of nearest neighbors, the images of those neighbors are either within $\epsilon \approx 75$
of each other (iterates of the same $f_j$), or further than $\epsilon \approx 75$ apart (iterates of different $f_j$).

FIG. 9. The first difference of the sequence $J = \{j_1, \ldots, j_K\}$ of ghost indices. That is, $x_{j_i}$ is
identified as a probable ghost point for $1 \le i \le K$. A point $(i, n)$ on the graph indicates that
$j_{i+1} - j_i = n$.

We separated the ghosts as follows: For each $x_j \in \Gamma$, we examine the image of $x_j$ and
its nine nearest neighbors. Specifically, if $\sigma(\mathcal{N}_{10}(x_j))$ consists of two $\epsilon$-connected components,
then we identify the members of the smaller component (in cardinality) as candidates for
points on the ghost triangle. Hence, we have a sequence $J = \{j_1, \ldots, j_K\}$ such that $x_{j_i} \in \Gamma$
is identified as a point on the ghost for $1 \le i \le K$. These points correspond to the lower
ghost of Fig. 7(b); the second ghost is just an image of the first, a necessary result of the
symmetry inherent in the time-delay embedding process. Furthermore, we note that the
occurrence of ghost points is strongly periodic, with a period of 215 measurements. This
claim is motivated by the plot in Fig. 9. This plot is the first difference of the sequence $J$,
i.e., the point $(i, n)$ indicates that the $(i+1)$st ghost occurs $n$ measurements after the $i$th
ghost. Furthermore, the points on this graph that fall below the line $j_{i+1} - j_i = 215$ are the
result of an 'extra' identification in the middle of a period. For example, the first two such
points have coordinates (42, 158) and (43, 57). Since 158 + 57 = 215, one might hypothesize
that the 42nd ghost identification is actually spurious.
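
The first-difference check can be written compactly. In the sketch below, an identification is flagged when the short gap before it and the gap after it add up to exactly one period, as in the 158 + 57 = 215 example; this flagging rule, the function names, and the toy indices are assumptions made for illustration, not a procedure stated in the text.

```python
from typing import List, Sequence


def first_differences(indices: Sequence[int]) -> List[int]:
    """Gaps j_{i+1} - j_i between consecutive ghost indices."""
    return [b - a for a, b in zip(indices, indices[1:])]


def flag_spurious(indices: Sequence[int], period: int) -> List[int]:
    """Zero-based positions in `indices` of identifications that sit in
    the middle of a period: the gap before is shorter than the period
    and the gaps on either side sum to exactly one period."""
    gaps = first_differences(indices)
    return [
        i + 1
        for i in range(len(gaps) - 1)
        if gaps[i] < period and gaps[i] + gaps[i + 1] == period
    ]


# Toy usage: one extra identification at index 373 inside a 215-step period.
toy_indices = [0, 215, 373, 430, 645]
toy_gaps = first_differences(toy_indices)
toy_flags = flag_spurious(toy_indices, period=215)
```

Removing the flagged entries leaves a sequence whose gaps all equal the period in the toy data.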

FIG. 10. (a) The lower ghost triangle separated from the data of Fig. 7(b). Each of the 295 points
in this plot was identified as being $\epsilon$-disconnected ($\epsilon = 75$) from the images of an $\epsilon$-connected set
of points. (b) A two-dimensional time-delay embedding of the adjusted time series obtained by
adding 200 cache misses to the time-series values corresponding to each of the points from (a).
[Axis labels: cache misses.]

This analysis reinforces the hypothesis that there is a direct correspondence between
the points on the ghosts and points in the time series that are periodically spaced by 215
measurements. Moreover, each ghost point appears to be shifted exactly 200 cache misses
from the main triangle. Indeed, adding 200 cache misses to each of the points identified as
parts of a ghost triangle produces a time series that has the embedding shown in Fig. 10(b).
Thus, for this case, not only is the regime identified, but the dynamics of the two components
is shown to be simply related: by just a shift.
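
The adjustment itself is a one-line operation on the scalar time series. The sketch below assumes that each identified ghost point corresponds to a single time-series index; how embedded points map back to series indices depends on the embedding convention and is not spelled out here, so the function name and the index handling are hypothetical.

```python
from typing import Iterable, List, Sequence


def add_shift(
    series: Sequence[float], ghost_indices: Iterable[int], shift: float = 200
) -> List[float]:
    """Return a copy of the series with `shift` added at the ghost indices."""
    ghosts = set(ghost_indices)
    return [v + shift if t in ghosts else v for t, v in enumerate(series)]


# Toy usage: the middle sample sits 200 cache misses below its neighbors.
adjusted = add_shift([500, 300, 510], ghost_indices=[1])
```

Re-embedding the adjusted series and repeating the connectivity test is one way to check whether the shifted points rejoin the main component.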

There is one issue remaining before we can model this data as an IFS: we only have access
to measurements of the states of the IFS. That is, if $X$ is the state space of the computer
system and $\{f_0, \ldots, f_k\}$ is a collection of maps on $X$, the observations correspond to the
functions $\{h \circ f_0, \ldots, h \circ f_k\}$ with a continuous measurement function $h : X \to \mathbb{R}$ that maps
the state of the computer system to the number of cache misses that occur over the given
time interval. It can be shown that the functions $h \circ f_i$ are sufficient for studying topological
and geometric properties of the IFS; a more-detailed treatment of the function $h$ can be found
in Alexander et al.

Thus, if $h \circ f_0$ denotes the dynamics associated with the main triangle, we can define
$h \circ f_1(x) = h \circ f_0(x) - 200$, and $h \circ f_2(x) = h \circ f_0(y)$, where $y \in h^{-1}(h(x) + 200)$. The
IFS consists of the state space $X$, the collection $\{f_0, f_1, f_2\}$ of continuous maps, and the
sequence $\{n_j\} \subset \{0, 1, 2\}$, where

$$n_j = \begin{cases}
1 & \text{if } j \equiv 0 \bmod 215, \\
2 & \text{if } j \equiv 1 \bmod 215, \\
0 & \text{otherwise.}
\end{cases}$$
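
The regime sequence defined above is deterministic and periodic, so it can be tabulated directly. The short sketch below does this; the function name and the default argument are illustrative choices.

```python
from typing import List


def regime(j: int, period: int = 215) -> int:
    """Regime index n_j: 1 at the start of each period, 2 one step later,
    and 0 (the dominant map f_0) otherwise."""
    r = j % period
    if r == 0:
        return 1
    if r == 1:
        return 2
    return 0


# Regime labels for two full periods.
labels: List[int] = [regime(j) for j in range(430)]
```

Over any full period, 213 of the 215 steps use $f_0$, which is consistent with the description of a single dominant map.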



This model of the cache-miss dynamics on the Intel Core2® rests on the assumption
that $f_1$ and $f_2$ can be described completely in terms of $f_0$. To verify this assumption, we
tested for determinism in the adjusted dynamical system of Fig. 10(a). We found that out of
the 295 points so identified, only 16 failed to lie in an $\epsilon$-connected image set in the adjusted
dynamical system. Consequently, $f_0$ appears to be a continuous function and the IFS is an
accurate model for this data set.

Much of the usefulness of this model originates from the fact that we have isolated the
continuous function $f_0$. In light of this, it is reasonable to assume that $f_0$ is
representative of some low-dimensional dynamics that are present in the computer system, while $f_1$
and $f_2$ represent a secondary piece of dynamics, in this case perhaps best described as
'deterministic additive noise'.

V. CONCLUSION



Many techniques for time-series analysis, such as in Mischaikow et al., explicitly require
the time series to be generated by a continuous function, and almost all of them implicitly
require that it be generated by a single function. For example, in Mytkowicz et al.,
time-series analysis of the data of Fig. 7(a) showed that it has a positive Lyapunov exponent and
fractal correlation dimension. However, our results show that this time series interleaves
trajectories from different dynamical systems, a property that can trip up traditional
time-series analysis techniques. In the data studied in Mytkowicz et al., this proved not to be
an issue because a single $f_0$ overwhelmingly dominated the dynamics. When that is not
the case, problems can arise with traditional methods, which are often formulated assuming
the existence of long, uninterrupted deterministic trajectories. Some techniques, such as
those in Mischaikow et al., do not explicitly require uninterrupted trajectories. Using our
topology-based approach, one could pull apart the time-series data into individual regimes
and study the dynamics of each of the $f_i$ independently.

In conclusion, we have described an algorithm for detection and separation of a signal 
that is generated by continuous, deterministic dynamics punctuated by regime shifts. The 
algorithm handles shifts that result from stochastic or deterministic processes: it applies 
whenever the dynamics are described by an iterated function system. Time-series data from 
a computer performance analysis experiment were shown to fit this model. More generally, 
we claim that iterated function systems are a natural model for complex computer programs, 
which — we hypothesize — have regime shifts as their execution moves through different parts 
of the code. 

IFS models provide a natural framework for data analysis in a wide range of fields: 
whenever the physical system generating the data is characterized by continuous systems
that are punctuated by discontinuous regime shifts. Another area in which the regime
separation technique is particularly appropriate is digital communication channels. Indeed,
(hyperbolic) iterated function systems are known to provide useful models of these channels.
A channel corresponds to an electrical circuit externally driven by a digital signal, and the 
discrete input signal corresponds to the regime sequence. Thus, the behavior of the circuit 
corresponds to the actions of a discrete set of continuous dynamical systems. A fundamental 
problem in this context is channel equalization, the reversal of distortion that is incurred 
by transmission through a channel. This is precisely the determination of the input signal
sequence from a sequence of output values — i.e., regime separation. Channel equalization 
is straightforward for linear dynamics because the IFS attractors in these situations tend 
to be non-overlapping. However, more-realistic, nonlinear IFS models have overlapping 
attractors. We believe that our methods can be successfully used for channel equalization 
in this context. 

Challenges that remain to be addressed include finding an efficient implementation for 
high-dimensional data and dealing with systems that have traditional (e.g., Gaussian) noise
in addition to regime shifts. Furthermore, we have not addressed the nature of the switching
process itself. Once the regime shifts have been determined, the next natural question to
ask is whether the switching is deterministic or stochastic, and whether one can determine
the rule for switching between regimes.